Abstract. Wall Street English* has built online activities that allow students to 
record phrases and receive word-level Automatic Speech Recognition (ASR) 
driven pronunciation feedback. Students in language centres in China, Vietnam, 
Saudi Arabia, and Italy (N=2,867) used ASR-Computer Assisted Language 
Learning (CALL) activities, and some (N=482) completed a questionnaire. 
A large majority of respondents reported that ASR-CALL activities helped them 
to improve their pronunciation. However, the study found marked differences in 
the use of product features across countries: students from Vietnam and China 
used more retries than students from Saudi Arabia, and students from Italy used 
the fewest. Students from China, Vietnam, and Saudi Arabia listened to model 
audio more frequently than students from Italy. A series of Kruskal-Wallis 
tests revealed significant differences in students’ beliefs and perceptions 
about using ASR, both across dominant L1s and across age groups. This study 
points to the importance of considering regional differences, and suggests that 
learner engagement may depend not only on the effectiveness of the technology, 
but also on learner beliefs and perceptions. 


Keywords: ASR, speech recognition, pronunciation, pronunciation feedback, 
learner beliefs and perceptions. 
1. Introduction 


ASR-CALL activities offer considerable opportunities for individualised practice 
and personalised feedback on pronunciation (Levis, 2007), and recent studies 
demonstrate that ASR-CALL activities can have a measurable impact on learning 
(inter alia, Golonka et al., 2014). 


Perceptions regarding the difficulty of achieving English pronunciation skills may 
be linked to L1/nationality (cf. Cenoz & Lecumberri, 1999; Simon & Taverniers, 
2011), and while there are commonalities, beliefs regarding effective learning 
strategies have been found to differ according to learner L1/nationality (Nowacka, 
2012). Such beliefs may lead to preconceptions about the effectiveness of 
ASR-CALL activities, which may affect student engagement. This study explores 
adult students’ engagement with ASR-CALL activities and aims to address the 
following research questions. 


• Do students think pronunciation activities with ASR help them improve 
their pronunciation? 


• Do students in four countries (China, Vietnam, Italy, and Saudi Arabia) 
make different use of ASR-CALL activity features? 


• Are there differences between L1s and age groups in students’ beliefs and 
perceptions on learning pronunciation using ASR? 


2. Method 


2.1. Context and participants 


Wall Street English uses proprietary learning content to deliver a blended-learning 
course, offering a combination of multimedia self-study lessons and face-to-face 
teacher-led classes. It incorporated ASR word-level pronunciation feedback into 
three multimedia activity types: (1) repeat and practise, (2) read and record, and 
(3) conversation. 


Students were exposed to these ASR activities as part of their course over six 
weeks, after which researchers had access to anonymised back-end data. Participants 
(N=2,867) completed ASR-powered course activities, and 482 of them responded 
to an optional, anonymous online questionnaire. The most common age groups among 
questionnaire respondents differed by country: 19-40 in China, 16-40 in Vietnam, 
16-30 in Saudi Arabia, and 23-60 in Italy (Table 1). 

Table 1. Participants 

                              China         Vietnam       Saudi Arabia  Italy         Total
                              N      %      N      %      N      %      N      %      N
Studied activities with ASR   1,153  40%    1,192  42%    171    6%     351    12%    2,867
Took the questionnaire        153    32%    173    36%    85     18%    71     15%    482

2.2. Instruments 


Learners completed the activities as part of their normal studies. They were already 
familiar with the activities; only the ASR feedback was new. Questionnaire items (see 
Table 2 and Table 3) were based on previous research. A six-point Likert scale (from 
1=not at all to 6=fully agree) was used. 


3. Results and discussion 


Ninety-five percent of students believed that pronunciation activities with 
ASR helped improve their pronunciation. Vietnamese students were the most 
enthusiastic (98.8%), closely followed by students in Italy (98.5%), then in Saudi 
Arabia (95.2%), and, finally, students in China (91.5%). 


Concerning the usage of ASR activity features, students in all territories used far 
fewer attempts than they were allowed. Students in China, Saudi Arabia, and Italy 
typically used only one attempt (Mdn=1) and those in Vietnam used two (Mdn=2), 
whereas they were allowed three attempts in Activities 1 and 2, and four in Activity 3. 
Students in China and Vietnam reported listening to model audio recordings the most 
(82%), followed by students in Saudi Arabia (57%) and in Italy (41%). Students in 
Vietnam reported listening to recordings of their own utterances the most (80%), 
followed by students in China (66%) and in Saudi Arabia (61%), with the lowest usage 
reported by students in Italy (57%). 


Finally, we investigated potential differences between age groups and between L1s 
in students’ responses to the items on beliefs and perceptions about L2 pronunciation. 
The analyses for age revealed statistically significant differences between age groups 
on all items but one (Item 4; see Table 2). Differences favoured the youngest age group 
(16-22), suggesting that older learners may have less overall confidence in their 
ability to acquire pronunciation skills (Marinova-Todd, Marshall, & Snow, 2000). 

Table 2. Kruskal-Wallis test results for beliefs and perceptions about L2 
pronunciation and age ranges 

   Group        N     M (SD)       Mean rank  df  χ²           p      Effect size (d)

1  I believe that I will eventually be able to speak English very well.
   All          482                           2   49           .000*  0.66
   16-22        167   5.26 (.83)   268.45
   23-30        153   4.97 (1.21)  243.62
   31-60+       162   4.75 (1.18)  211.72

2  I can use technology to help me improve my pronunciation.
   All          482                           2   14.4         .001*  0.32
   16-22        167   5.22 (.93)   272.34
   23-30        153   4.73 (1.23)  219.60
   31-60+       162   4.80 (1.27)  230.39

3  I feel at ease when I have to speak English.
   All          482                           2   19.3         .000*  0.03
   16-22        167   4.37 (1.27)  278.53
   23-30        153   3.83 (1.42)  227.26
   31-60+       162   3.74 (1.39)  216.77

4  I feel insecure about my pronunciation.
   All          482                           2   [illegible]  .624
   16-22        167   3.67 (1.55)  248.46
   23-30        153   3.60 (1.57)  242.02
   31-60+       162   3.53 (1.39)  233.83

5  It is important for me to speak English with an excellent English pronunciation.
   All          482                           2   21           .000*  0.12
   16-22        167   5.64 (.90)   269.93
   23-30        153   5.43 (1.02)  241.87
   31-60+       162   5.24 (1.05)  211.85

6  I am happy with my pronunciation as long as people can understand me.
   All          482                           2   8.4          .015*  0.23
   16-22        167   3.76 (1.60)  259.58
   23-30        153   3.21 (1.78)  216.05
   31-60+       162   3.59 (1.55)  246.90

* p<0.05; Cohen’s d: 0.2 (small), 0.5 (medium), 0.8 (large) 
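
As an illustration of the test reported in Table 2, the Kruskal-Wallis H statistic can be sketched in plain Python as follows. This is a minimal sketch, not the authors' analysis script: the function names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the Likert responses in the usage lines are invented for demonstration because the study's raw data are not reproduced here.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Compute the tie-corrected Kruskal-Wallis H for a list of samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: Likert data contain many ties, so this matters.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical six-point Likert responses for three age groups.
responses = {
    "16-22": [6, 5, 6, 5, 4, 6],
    "23-30": [5, 4, 5, 3, 5, 4],
    "31-60+": [4, 3, 5, 4, 3, 4],
}
print(kruskal_wallis_h(list(responses.values())))
```

The resulting H is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups (2 for three age groups), which corresponds to the df column of the table.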

Results for the L1 analyses are displayed in Table 3. The analyses yielded statistically 
significant differences between L1s on all statements about beliefs and perceptions 
of L2 pronunciation, in line with some findings reported in Nowacka 
(2012). 


The effect sizes were noticeably larger than for age groups, which may help to 
explain some of the differences found in feature usage. For example, students from 
Italy, who reported listening least often to both the model audio and their own 
recordings, also showed the lowest belief in their ability to speak English very 
well, and placed the lowest importance on speaking with excellent pronunciation. 


Table 3. Kruskal-Wallis test results for beliefs and perceptions about L2 
pronunciation with L1s 

   Group        N     M (SD)       Mean rank  df  χ²           p      Effect size (d)

1  I believe that I will eventually be able to speak English very well.
   All          482                           3   29.5         .000*  0.48
   Chinese      153   4.83 (1.37)  234.89
   Vietnamese   173   5.20 (.85)   259.07
   Arabic       85    5.29 (.89)   276.22
   Italian      71    4.52 (.99)   171.37

2  I can use technology to help me improve my pronunciation.
   All          482                           3   19.1         .000*  0.37
   Chinese      153   4.58 (1.42)  211.92
   Vietnamese   173   5.19 (.92)   266.75
   Arabic       85    5.12 (1.05)  264.54
   Italian      71    4.77 (1.07)  216.13

3  I feel at ease when I have to speak English.
   All          482                           3   37.9         .000*  0.56
   Chinese      153   3.70 (1.52)  217.84
   Vietnamese   173   4.47 (1.21)  289.59
   Arabic       85    3.92 (1.39)  231.39
   Italian      71    3.49 (1.09)  187.42

4  I feel insecure about my pronunciation.
   All          482                           3   13.9         .003*  0.30
   Chinese      153   3.47 (1.67)  228.31
   Vietnamese   173   3.93 (1.35)  270.12
   Arabic       85    3.22 (1.66)  208.49
   Italian      71    3.60 (1.35)  239.70

5  It is important for me to speak English with an excellent English pronunciation.
   All          482                           3   113.1        .000*  1.09
   Chinese      153   5.35 (1.13)  236.86
   Vietnamese   173   5.71 (.68)   269.73
   Arabic       85    5.88 (.42)   294.76
   Italian      71    4.45 (1.16)  118.96

6  I am happy with my pronunciation as long as people can understand me.
   All          482                           3   77.1         .000*  0.85
   Chinese      153   2.62 (1.63)  168.05
   Vietnamese   173   3.76 (1.52)  251.76
   Arabic       85    4.43 (1.51)  315.31
   Italian      71    4.09 (1.19)  286.42

* p<0.05; Cohen’s d: 0.2 (small), 0.5 (medium), 0.8 (large) 
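
The text does not state how the effect sizes in the d column were derived. One common conversion, assumed here, first computes eta squared from the Kruskal-Wallis statistic as (H - k + 1) / (n - k), where k is the number of groups and n the total sample size, and then converts it to Cohen's d as 2 * sqrt(eta² / (1 - eta²)). The sketch below applies that assumed conversion to the χ² values in Table 3 (k = 4, n = 482); the function names are hypothetical. Under this assumption the computed values fall within 0.01 of the six reported d values.

```python
import math


def eta_squared_from_h(h, k, n):
    """Eta squared based on the Kruskal-Wallis H statistic (assumed formula)."""
    return (h - k + 1) / (n - k)


def cohens_d_from_eta_squared(eta_sq):
    """Convert eta squared to Cohen's d."""
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


def d_from_h(h, k, n):
    """Chain both steps: Kruskal-Wallis H -> eta squared -> Cohen's d."""
    return cohens_d_from_eta_squared(eta_squared_from_h(h, k, n))


# (item number, chi-square statistic, reported d) taken from Table 3.
table3 = [
    (1, 29.5, 0.48),
    (2, 19.1, 0.37),
    (3, 37.9, 0.56),
    (4, 13.9, 0.30),
    (5, 113.1, 1.09),
    (6, 77.1, 0.85),
]

for item, h, reported in table3:
    computed = d_from_h(h, k=4, n=482)
    print(f"Item {item}: computed d = {computed:.3f}, reported d = {reported:.2f}")
```

Because the conversion is an assumption rather than a documented step, small discrepancies in the second decimal place may reflect rounding of the χ² values or a slightly different formula.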


4. Conclusions 


Learners were overwhelmingly positive towards ASR-CALL’s potential in helping 
improve their pronunciation. However, differences in feature usage were observed 
between students of different L1s/nationalities, which may be related to differences 
in learner beliefs. 